

Independent Policy Gradient Methods for Competitive Reinforcement Learning

Neural Information Processing Systems

Minimax value. Shapley [63] showed that for any zero-sum stochastic game $G$, there exists a pair of policies $(\pi_1^\star, \pi_2^\star)$ such that
$$V(\pi_1^\star, \pi_2) \le V(\pi_1^\star, \pi_2^\star) \le V(\pi_1, \pi_2^\star) \quad \text{for all } \pi_1, \pi_2, \tag{1}$$
and in particular $V^\star = \min_{\pi_1}\max_{\pi_2} V(\pi_1, \pi_2) = \max_{\pi_2}\min_{\pi_1} V(\pi_1, \pi_2)$. The crux of the analysis is that the $x$-player operates at a slower timescale than the $y$-player. Compared to [43], which establishes a gradient dominance property for the $y$-player, the argument here tracks the $y$-player's objective $\bar f_t(y) := f(x_t, y)$ induced by the slow iterate $x_t$, then averages over iterates.
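For intuition, relation (1) can be checked numerically in the single-state (matrix game) special case, where $V(x, y) = x^\top A y$. The sketch below is illustrative only, not from the paper: the matching-pennies matrix A and the helper names are our own. It solves both sides of the minimax identity as linear programs with scipy and confirms they agree.

import numpy as np
from scipy.optimize import linprog

def min_max_value(A):
    # Min-player: minimize v subject to (A^T x)_j <= v for every column j,
    # with x on the probability simplex. Decision variables are (x, v).
    m, n = A.shape
    c = np.r_[np.zeros(m), 1.0]
    A_ub = np.hstack([A.T, -np.ones((n, 1))])   # A^T x - v <= 0
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[:m], res.x[-1]

def max_min_value(A):
    # Max-player: maximize w subject to (A y)_i >= w for every row i.
    m, n = A.shape
    c = np.r_[np.zeros(n), -1.0]                # linprog minimizes, so flip sign
    A_ub = np.hstack([-A, np.ones((m, 1))])     # w - (A y)_i <= 0
    A_eq = np.r_[np.ones(n), 0.0].reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * n + [(None, None)])
    return res.x[:n], res.x[-1]

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies, payoff to the max-player
x_star, v = min_max_value(A)
y_star, w = max_min_value(A)
print(x_star, y_star, v, w)  # x*, y* ~ (0.5, 0.5); v == w == 0, as (1) predicts

For general stochastic games the value is defined over policies rather than mixed actions, but the saddle-point structure is the same.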


Independent Policy Gradient Methods for Competitive Reinforcement Learning

Neural Information Processing Systems

We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent learning in competitive RL, as prior work has largely focused on centralized/coordinated procedures for equilibrium computation.
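As a concrete illustration of the two-timescale rule, here is a minimal single-state sketch. It is our own simplification, not the paper's exact algorithm: the paper analyzes projected gradient steps on directly parameterized policies, while this uses a softmax/REINFORCE update for brevity. Each player samples an action from its own policy, observes only its own reward, and takes an unbiased policy gradient step; the slow player's step size is an order of magnitude smaller than the fast player's.

import numpy as np

rng = np.random.default_rng(0)

# Matching pennies; A[a1, a2] is the reward to player 1 (the max-player).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

theta_max = np.zeros(2)            # fast player's logits
theta_min = np.zeros(2)            # slow player's logits
eta_fast, eta_slow = 5e-2, 5e-3    # two-timescale rule: eta_slow << eta_fast

T = 200_000
avg_max, avg_min = np.zeros(2), np.zeros(2)
for t in range(T):
    p_max, p_min = softmax(theta_max), softmax(theta_min)
    a1 = rng.choice(2, p=p_max)
    a2 = rng.choice(2, p=p_min)
    r1 = A[a1, a2]                 # each player observes only its own reward
    r2 = -r1
    # Score-function (REINFORCE) gradient estimate: (e_a - pi) * reward
    g1 = -p_max; g1[a1] += 1.0
    g2 = -p_min; g2[a2] += 1.0
    theta_max += eta_fast * r1 * g1
    theta_min += eta_slow * r2 * g2
    avg_max += p_max / T
    avg_min += p_min / T

print(avg_max, avg_min)  # average policies approach (0.5, 0.5), the equilibrium

The two-timescale choice matters: with equal step sizes, simultaneous gradient dynamics in zero-sum games can cycle around the equilibrium rather than approach it, which is why the paper's guarantee requires this rule.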


Review for NeurIPS paper: Independent Policy Gradient Methods for Competitive Reinforcement Learning

Neural Information Processing Systems

Weaknesses: I am not convinced by the main motivation of this paper for decoupled or independent learning. Specifically, from the communication perspective, once agents can communicate the actions they each took per round, each agent can simulate any coupled algorithm locally (or any coupled online algorithm, if it has storage limitations). Since agents have to communicate with the oracle or environment in each round anyway, I don't see why communicating actions during learning would be problematic in practice. Second, this paper says that independent learning is important because it allows the algorithm to be "versatile, being applicable even in uncertain environments where the type of interaction and number of other agents are not known to the agent." I feel this description does not fit the algorithm studied in this paper and is thus a bit misleading.


Review for NeurIPS paper: Independent Policy Gradient Methods for Competitive Reinforcement Learning

Neural Information Processing Systems

The reviewers agreed that this is solid work on an important problem for which existing results are scarce. However, there were several concerns:
- The authors create some confusion in describing their method as "independent": the agents have to coordinate the learning rates ahead of time.
I believe that these concerns actually open the door for interesting follow-up work, and therefore recommend acceptance. I ask the authors to tone down the independence claims in the final version, given the concern above.

